question type
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Asia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Israel (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-World Scenarios
Chang, Eun, Huang, Zhuangqun, Liao, Yiwei, Bhavsar, Sagar Ravi, Param, Amogh, Stark, Tammy, Ahmadyan, Adel, Yang, Xiao, Wang, Jiaqi, Abdullah, Ahsan, Nguyen, Giang, Iyer, Akil, Hall, David, Li, Elissa, Moon, Shane, Scheffer, Nicolas, Ahmed, Kirmani, Damavandi, Babak, Wanga, Rakesh, Kumar, Anuj, Patel, Rohit, Dong, Xin Luna
We introduce WearVQA, the first benchmark specifically designed to evaluate the Visual Question Answering (VQA) capabilities of multi-modal AI assistants on wearable devices like smart glasses. Unlike prior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique challenges of egocentric interaction, where visual inputs may be occluded, poorly lit, unzoomed, or blurry, and questions are grounded in realistic wearable use cases. The benchmark comprises 2,520 carefully curated image-question-answer triplets spanning 7 diverse image domains (including both text-centric and general scenes), 10 cognitive task types ranging from basic recognition to various forms of reasoning, and 6 common wearables-specific image quality issues. All questions are designed to be answerable using only the visual input and common sense. WearVQA is paired with a rigorous LLM-as-a-judge evaluation framework with 96% labeling accuracy. Open-source and proprietary multi-modal LLMs achieve QA accuracies of only 24-52% on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks.
- Information Technology > Security & Privacy (0.46)
- Information Technology > Hardware (0.35)
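The abstract does not spell out the LLM-as-a-judge framework, but a minimal sketch of how such grading is typically wired up might look as follows; the prompt wording, the `grade`/`accuracy` helpers, and the offline stand-in judge are all illustrative assumptions, not WearVQA's actual implementation:

```python
JUDGE_PROMPT = """\
You are grading a visual question answering system.
Question: {question}
Reference answer: {reference}
Model answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def grade(question, reference, candidate, judge_fn):
    """Ask the judge whether the candidate answer matches the reference."""
    verdict = judge_fn(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    return verdict.strip().upper().startswith("CORRECT")

def accuracy(triplets, predictions, judge_fn):
    """QA accuracy over (image, question, answer) triplets."""
    hits = sum(grade(q, gold, pred, judge_fn)
               for (_img, q, gold), pred in zip(triplets, predictions))
    return hits / len(triplets)

def demo_judge(prompt):
    # Stand-in for a real LLM call so the sketch runs offline:
    # compare the two answer lines case-insensitively.
    fields = dict(line.split(": ", 1)
                  for line in prompt.splitlines() if ": " in line)
    same = (fields["Reference answer"].strip().lower()
            == fields["Model answer"].strip().lower())
    return "CORRECT" if same else "INCORRECT"

print(accuracy(
    [(None, "What does the sign say?", "open"),
     (None, "How many eggs are left?", "3")],
    ["Open", "two"],
    demo_judge))  # -> 0.5
```

In a real setup, `judge_fn` would wrap a call to a strong LLM; the string-matching stand-in exists only so the sketch executes without an API key.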
Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
Muneeb, Muhammad, Ascher, David B., Bakht, Ahsan Baidar
Context-based question answering (CBQA) models provide more accurate and relevant answers by considering contextual information. They effectively extract specific information from a given context, making them useful in applications such as user support, information retrieval, and educational platforms. In this manuscript, we benchmark the performance of 47 CBQA models from Hugging Face on eight different datasets. This study aims to identify the best-performing model across diverse datasets without additional fine-tuning, which is valuable for practical applications where retraining models for specific datasets should be minimized, streamlining their deployment in various contexts. The best-performing models were trained on the SQuAD v2 or SQuAD v1 datasets. The overall best performer was ahotrod/electra_large_discriminator_squad2_512, which yielded 43% accuracy across all datasets. We observed that the computation time of every model depends on the context length and the model size. A model's performance usually decreases as answer length increases, and it also depends on context complexity. We further used a genetic algorithm to improve overall accuracy by integrating responses from other models. ahotrod/electra_large_discriminator_squad2_512 generated the best results for bioasq10b-factoid (65.92%), biomedical_cpgQA (96.45%), QuAC (11.13%), and Question Answer Dataset (41.6%). bert-large-uncased-whole-word-masking-finetuned-squad achieved an accuracy of 82% on the IELTS dataset.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Oceania > Australia > Queensland (0.04)
- Europe > Austria (0.04)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.68)
- Health & Medicine (1.00)
- Energy (0.68)
- Education (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
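A minimal sketch of the benchmarking loop this entry describes, assuming the Hugging Face `transformers` library; the checkpoint id comes from the abstract, but the in-line sample and the exact-match scoring are illustrative simplifications, not the paper's datasets or metric:

```python
import time
from transformers import pipeline

# Load the paper's overall best checkpoint through the standard QA pipeline.
qa = pipeline("question-answering",
              model="ahotrod/electra_large_discriminator_squad2_512")

# Illustrative stand-in for one of the eight benchmark datasets.
samples = [
    {"question": "What do CBQA models use to answer questions?",
     "context": "Context-based question answering (CBQA) models answer "
                "questions by extracting a span from a supplied passage.",
     "answer": "a supplied passage"},
]

correct, start = 0, time.perf_counter()
for s in samples:
    pred = qa(question=s["question"], context=s["context"])["answer"]
    correct += int(pred.strip().lower() == s["answer"].lower())
elapsed = time.perf_counter() - start  # grows with context length and model size

print(f"accuracy={correct / len(samples):.2f}  wall_time={elapsed:.2f}s")
```

Timing the loop as above mirrors the paper's observation that computation time scales with context length and model size; swapping the model id is all that changes between the 47 candidates.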
Comparing verbal, visual and combined explanations for Bayesian Network inferences
Nyberg, Erik P., Mascaro, Steven, Zukerman, Ingrid, Wybrow, Michael, Vo, Duc-Minh, Nicholson, Ann
Bayesian Networks (BNs) are an important tool for assisting probabilistic reasoning, but although they are considered transparent models, people have trouble understanding them. Further, current User Interfaces (UIs) still do not clarify the reasoning of BNs. To address this problem, we have designed verbal and visual extensions to the standard BN UI that can guide users through common inference patterns. We conducted a user study comparing our verbal, visual, and combined UI extensions against a baseline UI. Our main findings are: (1) users did better with all three types of extensions than with the baseline UI on questions about the impact of an observation, the paths that enable this impact, and the way in which an observation influences the impact of other observations; and (2) using the verbal and visual modalities together is better than using either modality alone for some of these question types.
- North America > United States > Massachusetts > Middlesex County > Lowell (0.14)
- Oceania > Australia (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (11 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Education (1.00)
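For readers unfamiliar with the inference patterns these UI extensions explain, here is a minimal sketch of "the impact of an observation" on a toy three-node chain, computed by brute-force enumeration; the network and CPT numbers are illustrative, not from the study:

```python
from itertools import product

def joint(c, r, w):
    """Joint probability for the chain Cloudy -> Rain -> WetGrass."""
    p_c = 0.5                   # P(Cloudy=True)
    p_r = 0.8 if c else 0.1     # P(Rain=True | Cloudy)
    p_w = 0.9 if r else 0.15    # P(Wet=True  | Rain)
    return ((p_c if c else 1 - p_c) *
            (p_r if r else 1 - p_r) *
            (p_w if w else 1 - p_w))

def p_wet(cloudy=None):
    """P(WetGrass=True), optionally conditioned on an observed Cloudy value."""
    num = den = 0.0
    for c, r, w in product([True, False], repeat=3):
        if cloudy is not None and c != cloudy:
            continue  # evidence rules this world out
        p = joint(c, r, w)
        den += p
        if w:
            num += p
    return num / den

print(f"P(Wet)              = {p_wet():.3f}")       # prior
print(f"P(Wet | Cloudy=yes) = {p_wet(True):.3f}")   # observation raises it
print(f"P(Wet | Cloudy=no)  = {p_wet(False):.3f}")  # observation lowers it
```

The difference between the prior and the two posteriors is exactly the kind of observation impact the study's questions probed, propagated along the single active path Cloudy -> Rain -> WetGrass.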
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy
Hu, Desheng, Baumann, Joachim, Urman, Aleksandra, Lichtenegger, Elsa, Forsberg, Robin, Hannak, Aniko, Wilson, Christo
Google Search increasingly surfaces AI-generated content through features like AI Overviews (AIO) and Featured Snippets (FS), which users frequently rely on despite having no control over their presentation. Through a systematic algorithm audit of 1,508 real baby care and pregnancy-related queries, we evaluate the quality and consistency of these information displays. Our robust evaluation framework assesses multiple quality dimensions, including answer consistency, relevance, presence of medical safeguards, source categories, and sentiment alignment. Our results reveal concerning gaps in information consistency: the AIO and FS displayed on the same search result page are inconsistent with each other in 33% of cases. Despite high relevance scores, both features critically lack medical safeguards (present in just 11% of AIO and 7% of FS responses). While health and wellness websites dominate the source categories for both AIO and FS, FS also frequently links to commercial sources. These findings have important implications for public health information access and demonstrate the need for stronger quality controls in AI-mediated health information. Our methodology provides a transferable framework for auditing AI systems in high-stakes domains where information quality directly impacts user well-being.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.46)
- Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Consumer Health (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
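The paper's evaluation framework is not specified beyond the dimensions it lists, but a minimal sketch of one way to proxy two of them, answer consistency (here, token-overlap similarity) and the presence of medical safeguards (here, a few hypothetical regex patterns), might look like:

```python
import re

# Hypothetical safeguard phrases; the audit's actual criteria are not given.
SAFEGUARD_PATTERNS = [
    r"consult (a|your) (doctor|pediatrician|healthcare provider)",
    r"seek medical (advice|attention)",
    r"talk to (a|your) (doctor|provider)",
]

def tokens(text):
    """Lowercased word tokens for a crude bag-of-words comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity between two answers, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def audit_pair(aio_text, fs_text, threshold=0.3):
    """Flag an AIO/FS pair whose answers diverge, and check for safeguards."""
    return {
        "consistent": jaccard(aio_text, fs_text) >= threshold,
        "aio_safeguard": any(re.search(p, aio_text.lower())
                             for p in SAFEGUARD_PATTERNS),
        "fs_safeguard": any(re.search(p, fs_text.lower())
                            for p in SAFEGUARD_PATTERNS),
    }

print(audit_pair(
    "Most babies start solids around 6 months; consult your pediatrician first.",
    "Solid foods can be introduced at 4 to 6 months of age.",
))  # inconsistent pair; safeguard present only in the AIO text
```

A production audit would replace the lexical-overlap proxy with a stronger semantic-similarity or LLM-based comparison, but the per-page pairing of AIO against FS is the structural idea.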